Random deep neural networks are biased towards simple functions
We prove that the binary classifiers of bit strings generated by random wide deep neural networks with the ReLU activation function are biased towards simple functions. The simplicity is captured by the following two properties. For any given input bit string, the average Hamming distance to the closest input bit string with a different classification is at least sqrt(n / (2π log n)), where n is the length of the string. Moreover, if the bits of the initial string are flipped randomly, the average number of flips required to change the classification grows linearly with n. These results are confirmed by numerical experiments on deep neural networks with two hidden layers, and settle the conjecture stating that random deep neural networks are biased towards simple functions. This conjecture was proposed and numerically explored in [Valle Pérez et al., ICLR 2019] to explain the unreasonably good generalization properties of deep learning algorithms. The probability distribution of the functions generated by random deep neural networks is a good choice for the prior probability distribution in the PAC-Bayesian generalization bounds. Our results constitute a fundamental step forward in the characterization of this distribution, thereby contributing to the understanding of the generalization properties of deep learning algorithms.
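The second property lends itself to a simple simulation. The sketch below (pure Python, with an assumed He-style Gaussian initialization and 0/1 bit strings; the paper's exact setup may differ) draws a random two-hidden-layer ReLU network and flips uniformly random bits of an input string until the binary classification changes:

```python
import random
import math

def relu(x):
    return x if x > 0.0 else 0.0

def random_net(n, width):
    """He-style random weights for a 2-hidden-layer ReLU net (an assumption;
    the paper's exact initialization may differ)."""
    def layer(n_in, n_out):
        s = math.sqrt(2.0 / n_in)
        return [[random.gauss(0.0, s) for _ in range(n_in)] for _ in range(n_out)]
    return layer(n, width), layer(width, width), layer(width, 1)

def forward(net, x):
    w1, w2, w3 = net
    h1 = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    h2 = [relu(sum(w * hi for w, hi in zip(row, h1))) for row in w2]
    out = sum(w * hi for w, hi in zip(w3[0], h2))
    return 1 if out > 0.0 else 0

def flips_to_change(net, x):
    """Flip uniformly random bits (without repetition) until the sign flips."""
    y0 = forward(net, x)
    order = list(range(len(x)))
    random.shuffle(order)
    x = list(x)
    for k, i in enumerate(order, start=1):
        x[i] = 1 - x[i]
        if forward(net, x) != y0:
            return k
    return len(x)

random.seed(0)
n, width, trials = 20, 64, 20
counts = []
for _ in range(trials):
    net = random_net(n, width)
    x = [random.randint(0, 1) for _ in range(n)]
    counts.append(flips_to_change(net, x))
avg = sum(counts) / len(counts)
print(f"average flips to change classification (n={n}): {avg:.1f}")
```

Averaging over many networks and strings, and sweeping n, is one way to probe the claimed linear growth empirically.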
Fisher-Bingham-like normalizing flows on the sphere
A generic D-dimensional Gaussian can be conditioned or projected onto the D-1 unit sphere, thereby leading to the well-known Fisher-Bingham (FB) or Angular Gaussian (AG) distribution families, respectively. These are some of the most fundamental distributions on the sphere, yet cannot straightforwardly be written as a normalizing flow except in two special cases: the von Mises-Fisher in D=3 and the central angular Gaussian in any D. In this paper, we describe how to generalize these special cases to a family of normalizing flows that behave similarly to the full FB or AG family in any D. We call them "zoom-linear-project" (ZLP)-Fisher flows. Unlike a normal Fisher-Bingham distribution, their composition allows one to gradually add complexity as needed. Furthermore, they can naturally handle conditional density estimation with target distributions that vary by orders of magnitude in scale - a setting that is important in astronomical applications but that existing flows often struggle with. A particularly useful member of the new family is the Kent analogue that can cheaply upgrade any flow in this situation to yield better performance.
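The projection construction underlying the angular Gaussian family is easy to sketch: draw a zero-mean Gaussian and radially project it onto the unit sphere. The sampler below is a minimal pure-Python illustration of that special case (the function name and interface are ours, not the paper's):

```python
import random
import math

def sample_angular_gaussian(cov_factor, n_samples):
    """Sample the central angular Gaussian on S^{D-1}: draw z ~ N(0, A A^T)
    with A = cov_factor, then project z / ||z|| onto the unit sphere.
    The identity factor recovers the uniform distribution on the sphere."""
    D = len(cov_factor)
    samples = []
    for _ in range(n_samples):
        g = [random.gauss(0.0, 1.0) for _ in range(D)]
        z = [sum(cov_factor[i][j] * g[j] for j in range(D)) for i in range(D)]
        norm = math.sqrt(sum(v * v for v in z))
        samples.append([v / norm for v in z])
    return samples

random.seed(1)
# An anisotropic factor: samples concentrate near the equator of S^2.
pts = sample_angular_gaussian([[1.0, 0.0, 0.0],
                               [0.0, 1.0, 0.0],
                               [0.0, 0.0, 0.2]], 5)
for p in pts:
    print(p)
```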
We certainly agree that it would be great to have a similar ...
The current analysis represents an important first step. This paper shows one successful "path" for unimprovable ...
This is an important first step for addressing more complex distributions. The current paper's analysis is rather difficult. It would be unreasonable to add additional content.
Still, no reasonable person could call these results "straightforward" and we kindly ask that this be ...
We address this in Section 5.3, though more elaboration may be helpful. As we discuss in Sec. ...
Reviews: Random deep neural networks are biased towards simple functions
Summary: If we assume the inputs of DNNs to be binary (e.g. ...). This theoretical result suggests that DNNs produce functions with low Kolmogorov complexity, which is useful for studying generalization bounds of DNNs. Some experiments on random data and on tiny nets on MNIST (in the supplement) are presented, empirically verifying the bounds. I tend to weakly reject this paper due to the weakness of the empirical and theoretical results and to the organization of the paper (MNIST results in the main text?). Given result 2), I'd strongly suggest that the authors think about what might be possible in an adversarial setting.
Reviews: Random deep neural networks are biased towards simple functions
The reviewers agreed that this was a very interesting submission, well-written and elegant, with a significant theoretical advance in our understanding of the effectiveness of neural networks. This advance nicely builds on previous empirical work by bringing theory to bear on explaining phenomena previously only demonstrated experimentally.
A comparative study of conformal prediction methods for valid uncertainty quantification in machine learning
In the past decades, most work in the area of data analysis and machine learning focused on optimizing predictive models and getting better results than what was possible with existing models. Whether the metrics used to measure such improvements accurately captured the intended goal, whether the numerical differences in the resulting values were significant, and whether uncertainty played a role and should have been taken into account were of secondary importance. Whereas probability theory, be it frequentist or Bayesian, used to be the gold standard in science before the advent of the supercomputer, it was quickly displaced by black-box models and sheer computing power because of their ability to handle large data sets. This evolution sadly came at the expense of interpretability and trustworthiness. However, while people are still trying to improve the predictive power of their models, the community is starting to realize that for many applications it is not so much the exact prediction that is of importance, but rather its variability or uncertainty. The work in this dissertation furthers the quest for a world where everyone is aware of uncertainty, of how important it is, and of how to embrace it instead of fearing it. A specific, though general, framework that allows anyone to obtain accurate uncertainty estimates is singled out and analysed. Certain aspects and applications of the framework -- dubbed `conformal prediction' -- are studied in detail. Whereas many approaches to uncertainty quantification make strong assumptions about the data, conformal prediction is, at the time of writing, the only framework that deserves the title `distribution-free': no parametric assumptions have to be made, and the nonparametric results hold without resorting to the law of large numbers in the asymptotic regime.
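As a concrete illustration of the framework, split conformal prediction wraps any fitted predictor: absolute residuals on a held-out calibration set yield prediction intervals with finite-sample marginal coverage under exchangeability. The sketch below uses a deliberately trivial mean predictor (all names and data are illustrative, not from the dissertation):

```python
import random
import math

def split_conformal_interval(train, calib, alpha=0.1):
    """Split conformal prediction with a toy mean predictor (any fitted
    regressor works). Returns a function mapping x to an interval with
    marginal coverage >= 1 - alpha, assuming exchangeable data."""
    # "Fit" the model on the proper training set: here, just the mean of y.
    mean_y = sum(y for _, y in train) / len(train)
    predict = lambda x: mean_y
    # Nonconformity scores on the calibration set: absolute residuals.
    scores = sorted(abs(y - predict(x)) for x, y in calib)
    # Conservative finite-sample quantile index.
    n = len(scores)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = scores[k]
    return lambda x: (predict(x) - q, predict(x) + q)

random.seed(2)
data = [(x, random.gauss(5.0, 1.0)) for x in range(200)]
interval = split_conformal_interval(data[:100], data[100:150], alpha=0.1)
covered = sum(lo <= y <= hi
              for (x, y), (lo, hi) in ((d, interval(d[0])) for d in data[150:]))
print(f"empirical coverage on 50 held-out points: {covered / 50:.2f}")
```

The coverage guarantee is distribution-free: nothing about the Gaussian toy data is used by the procedure itself.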
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.13)
- Asia > Middle East > Jordan (0.04)
- Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Transportation (1.00)
- Health & Medicine (1.00)
- Education > Educational Setting (1.00)
Points of non-linearity of functions generated by random neural networks
We consider functions from the real numbers to the real numbers, output by a neural network with 1 hidden activation layer, arbitrary width, and ReLU activation function. We assume that the parameters of the neural network are chosen uniformly at random with respect to various probability distributions, and compute the expected distribution of the points of non-linearity. We use these results to explain why the network may be biased towards outputting functions with simpler geometry, and why certain functions with low information-theoretic complexity are nonetheless hard for a neural network to approximate.
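For one-dimensional input the setting above is very concrete: with f(x) = Σ_i a_i ReLU(w_i x + b_i) + c, the function is piecewise linear and its candidate points of non-linearity are x = -b_i / w_i, where unit i switches on or off (assuming w_i ≠ 0 and no exact cancellation of output weights). A short sketch:

```python
import random

def nonlinearity_points(weights, biases):
    """For f(x) = sum_i a_i * relu(w_i * x + b_i) + c, f is piecewise
    linear; the i-th hidden unit switches on/off at x = -b_i / w_i, so
    these are the candidate points of non-linearity (kinks)."""
    return sorted(-b / w for w, b in zip(weights, biases) if w != 0.0)

random.seed(3)
width = 5
w = [random.gauss(0.0, 1.0) for _ in range(width)]
b = [random.gauss(0.0, 1.0) for _ in range(width)]
kinks = nonlinearity_points(w, b)
print(kinks)
```

Sampling w_i and b_i from different distributions and histogramming the kink locations is one way to reproduce the kind of expected distribution the abstract refers to.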
- Europe > Netherlands > South Holland > Leiden (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Breaking it Down: Gradient Descent
Originally published on Towards AI, the World's Leading AI and Technology News and Media Company. Gradient descent is an optimization algorithm that is used to improve the performance of deep/machine learning models.
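As a minimal illustration (a toy one-dimensional example, not from the original post), gradient descent repeatedly steps against the gradient of the objective:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Generic gradient descent: repeatedly step opposite the gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0, lr=0.1, steps=200)
print(f"argmin found: {x_min:.4f}")  # converges toward 3
```

The learning rate lr controls the step size: too large and the iterates diverge, too small and convergence is slow.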
Explaining Any ML Model? -- On Goals and Capabilities of XAI
Renftle, Moritz, Trittenbach, Holger, Poznic, Michael, Heil, Reinhard
The increasing ubiquity of machine learning (ML) motivates research on algorithms to explain ML models and their predictions -- so-called eXplainable Artificial Intelligence (XAI). Despite many survey papers and discussions, the goals and capabilities of XAI algorithms are far from being well understood. We argue that this is because of a problematic reasoning scheme in the XAI literature: XAI algorithms are said to complement ML models with desired properties, such as "interpretability" or "explainability". These properties are in turn assumed to contribute to a goal, like "trust" in an ML system. But most properties lack precise definitions, and their relationship to such goals is far from obvious. The result is a reasoning scheme that obfuscates research results and leaves an important question unanswered: What can one expect from XAI algorithms? In this article, we clarify the goals and capabilities of XAI algorithms from a concrete perspective: that of their users. Explaining ML models is only necessary if users have questions about them. We show that users can ask diverse questions, but that only one of them can be answered by current XAI algorithms. Answering this core question can be trivial, difficult or even impossible, depending on the ML application. Based on these insights, we outline which capabilities policymakers, researchers and society can reasonably expect from XAI algorithms.
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Overview (0.68)
- Research Report (0.50)